Smoke: Fine-grained Lineage at Interactive Speed
Authors
Abstract
Data lineage describes the relationship between individual input and output data items of a workflow, and has served as an integral ingredient for both traditional (e.g., debugging, auditing, data integration, and security) and emergent (e.g., interactive visualizations, iterative analytics, explanations, and cleaning) applications. The core, long-standing problem that lineage systems need to address, and the main focus of this paper, is to capture the relationships between input and output data items across a workflow with the goal of streamlining queries over lineage. Unfortunately, current lineage systems incur high lineage capture overheads, high lineage query processing costs, or both. As a result, applications that could in principle express their logic declaratively in lineage terms resort to hand-tuned implementations. To this end, we introduce Smoke, an in-memory database engine in which neither lineage capture overhead nor lineage query processing needs to be compromised. To do so, Smoke introduces tight integration of the lineage capture logic into physical database operators; efficient, write-optimized lineage representations for storage; and optimizations when future lineage queries are known up-front. Our experiments on microbenchmarks and realistic workloads show that Smoke reduces the lineage capture overhead and streamlines lineage queries by multiple orders of magnitude compared to state-of-the-art alternatives. Our experiments on real-world applications highlight that Smoke can meet the latency requirements of interactive visualizations (e.g., <150ms) and outperform hand-written implementations of data profiling primitives.
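To make the idea of capturing lineage inside a physical operator concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Smoke's implementation: the function name `group_count_with_lineage`, the row format, and the list-based backward and forward indexes are all hypothetical choices made for this example. The sketch shows a group-by count that records, in the same pass that builds the result, which input row ids contributed to each output row.

```python
from typing import Dict, Hashable, List, Tuple

Row = Dict[str, Hashable]


def group_count_with_lineage(
    rows: List[Row], key: str
) -> Tuple[List[list], List[List[int]], List[int]]:
    """Group-by COUNT that captures lineage while it aggregates.

    Hypothetical sketch: the index layout is an assumption for
    illustration, not the representation used by Smoke.
    """
    positions: Dict[Hashable, int] = {}  # group key -> output position
    output: List[list] = []              # output rows as [key, count]
    backward: List[List[int]] = []       # backward[o] = input row ids of output o
    forward: List[int] = []              # forward[rid] = output position of input rid

    for rid, row in enumerate(rows):
        k = row[key]
        pos = positions.get(k)
        if pos is None:
            # First time this group is seen: create its output row
            # and an empty backward-lineage list alongside it.
            pos = len(output)
            positions[k] = pos
            output.append([k, 0])
            backward.append([])
        output[pos][1] += 1
        backward[pos].append(rid)  # append-only, so writes stay cheap
        forward.append(pos)

    return output, backward, forward


rows = [{"city": "NYC"}, {"city": "SF"}, {"city": "NYC"}]
output, backward, forward = group_count_with_lineage(rows, "city")

# Backward lineage query: which input rows produced the first output row?
contributing_rows = [rows[rid] for rid in backward[0]]
```

With the indexes in place, a backward lineage query is a list lookup followed by direct row accesses, and a forward query is a single array read, with no need to re-run the aggregation.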
Similar resources
Exploring the Use of the Teaching Dimensions Observation Protocol to Develop Fine‐grained Measures of Interactive Teaching in Undergraduate Science Classrooms
to Develop Fine-grained Measures of Interactive Teaching in Undergraduate Science Classrooms (WCER Working Paper 2013-6). Retrieved from University of Wisconsin–Madison, Wisconsin Center for Education Research website: http://www.wcer.wisc.edu/publications/workingPapers/papers.php Exploring the Use of the Teaching Dimensions Observation Protocol to Develop Fine‐grained Measures of Interactive T...
Ultra-Fine Grained Dual-Phase Steels
This paper provides an overview on obtaining low-carbon ultra-fine grained dual-phase steels through rapid intercritical annealing of cold-rolled sheet as improved materials for automotive applications. A laboratory processing route was designed that involves cold-rolling of a tempered martensite structure followed by a second tempering step to produce a fine grained aggregate of ferrite and ca...
Striping for Interactive Video: Is It Worth It?
We study the design of interactive video servers that store videos on disk arrays. In order to avoid the hot–spot problem in video servers it is conventional wisdom to stripe the videos over the disk array using Fine Grained Striping or Coarse Grained Striping techniques. Striping, however, increases the seek and rotational overhead, thereby reducing the throughput of the disk array. Our result...
Supporting Fine-grained Data Lineage in a Database Visualization Environment
The lineage of a datum records its processing history. Because such information can be used to trace the source of anomalies and errors in processed data sets, it is valuable to users for a variety of applications including investigation of anomalies and debugging. Traditional data lineage approaches rely on metadata. However, metadata does not scale well to fine-grained lineage, especially in ...
Fine-grained device management in an interactive media server
The use of interactive media has already gained considerable popularity. Interactivity gives viewers VCR controls like slow-motion, pause, fast-forward, and instant replay. However, traditional server-based or client-based approaches for supporting interactivity either consume too much network bandwidth or require large client buffering; and hence they are economically unattractive. In this pap...
Journal title:
- PVLDB
Volume 11, Issue
Pages -
Publication date 2018